Systematic Interrelations Between Grapheme Frequencies and Word Length: Empirical Evidence from Slovene

Author

  • Emmerich Kelih
Abstract

This paper addresses the question of whether grapheme frequencies are directly related to word length. In other words, a possible interrelation between the frequency of graphemes and the length of linguistic units is discussed. Based on different Slovene text types, it is shown that the Altmann-Menzerath law is an adequate theoretical explanation for the supposed interrelation between grapheme frequencies and word length. Furthermore, a linguistic interpretation of the parameters of grapheme frequency models is offered.

Similar resources

Towards a General Model of Grapheme Frequencies for Slavic Languages

The present study discusses a possible theoretical model for grapheme frequencies of Slavic alphabets. Based on previous research on Slovene, Russian, and Slovak grapheme frequencies, the negative hypergeometric distribution is presented as a model, adequate for various Slavic languages. Additionally, arguments are provided in favor of the assumption that the parameters of this model can be int...

Towards automatic speech recognition without pronunciation dictionary, transcribed speech and text resources in the target language using cross-lingual word-to-phoneme alignment

In this paper we tackle the task of bootstrapping an Automatic Speech Recognition system without an a priori given language model, a pronunciation dictionary, or transcribed speech data for the target language Slovene – only untranscribed speech and translations to other resource-rich source languages of what was said are available. Therefore, our approach is highly relevant for under-resourced...

The grapho-phonological system of written French: Statistical analysis and empirical validation

The processes through which readers evoke mental representations of phonological forms from print constitute a hotly debated and controversial issue in current psycholinguistics. In this paper we present a computational analysis of the grapho-phonological system of written French, and an empirical validation of some of the obtained descriptive statistics. The results provide direct evidence dem...

Parameter interpretation of the Menzerath law: evidence from Serbian

The law-like relation between word and syllable length as part of the Menzerath law has been corroborated empirically in many different languages. As to South Slavic languages, we have the studies by Gajić (1950) and Grzybek (1999) on Croatian, and by Grzybek (2000) on Slovene. The aim of the present paper is first of all to provide empirical evidence of the Menzerath law for another South Slavic...

Evaluating the Noisy Channel Model for the Normalization of Historical Texts: Basque, Spanish and Slovene

This paper presents a method for the normalization of historical texts using a combination of weighted finite-state transducers and language models. We have extended our previous work on the normalization of dialectal texts and tested the method against a 17th century literary work in Basque. This preprocessed corpus is made available in the LREC repository. The performance of this (semi-)super...

Journal:
  • Journal of Quantitative Linguistics

Volume 19

Publication date: 2012